Chartjunk, Deception, and Chicanery

PH345: Winter 2025

Phil Boonstra

Graphical Integrity

  1. Representation of numbers should equal quantities represented
  2. Use clear, detailed, and thorough labeling, especially if there is risk of distortion or ambiguity
  3. Show data variation, not design variation
  4. In time-series displays of money, use inflation-adjusted units
  5. Number of dimensions (encodings) should not exceed number of datapoints
  6. Don’t quote data out of context

Representation of numbers should equal quantities represented

https://www.vox.com/2014/8/20/6040435/als-ice-bucket-challenge-and-why-we-give-to-charity-donate

Representation of numbers should equal quantities represented

What the plot has: \(x/y = 257.87/147\), where \(x\) and \(y\) are the radii (or diameters) of the two circles (numbers proportional to radius or diameter)

What the plot should have: \(x^2/y^2 = 257.87/147\) (numbers proportional to area)
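The arithmetic behind this correction can be sketched in R. Because a circle's area is \(\pi r^2\), encoding a value as area means the radius must scale with the square root of the value. The helper `radius_for_area()` below is hypothetical and written only for illustration; in ggplot2, `scale_size_area()` handles this kind of mapping for point sizes.

```r
# Hypothetical helper: radius of a circle whose area equals `value`
radius_for_area <- function(value) sqrt(value / pi)

x_value <- 257.87
y_value <- 147

# Correct encoding: the radius ratio is the square root of the value ratio
r_ratio <- radius_for_area(x_value) / radius_for_area(y_value)

# Squaring the radius ratio recovers the value ratio as an area ratio
area_ratio <- r_ratio^2

# Incorrect encoding (radius proportional to value): the area ratio the
# reader perceives is the square of the value ratio
naive_area_ratio <- (x_value / y_value)^2

round(c(value_ratio = x_value / y_value,
        radius_ratio = r_ratio,
        naive_area_ratio = naive_area_ratio), 2)
```

Drawing radii proportional to the values makes the larger circle cover about 3.08 times the area of the smaller one, when the values differ by a factor of only about 1.75.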

https://www.vox.com/2014/8/20/6040435/als-ice-bucket-challenge-and-why-we-give-to-charity-donate

Representation of numbers should equal quantities represented

Demonstration of how dynamite plots do not give an accurate representation of the data’s distribution; A and C show dynamite plots; B and D show ‘beeswarm’ plots, a type of univariate scatter plot; A and B both represent the same dataset—‘Dataset 1’, and C and D represent another—‘Dataset 2’; the dynamite plots A and C are identical, even though Dataset 1 and Dataset 2 are vastly different; B and D give a good representation of the two different datasets, allowing the reader to note that although these datasets have the same mean and standard error, they have vastly different distributions

Figure 1, Doggett and Way (2024)
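The core of this demonstration, two samples that share a mean and a standard error but not a distribution, can be reproduced in a few lines of R. The values below are made up for illustration and are not the data from Doggett and Way. The second sample starts from an arbitrary bimodal shape and is rescaled to match the first sample's mean and standard deviation; with equal sample sizes, the standard errors then match as well.

```r
# Hypothetical Dataset 1: unimodal, centered on 5
d1 <- c(3, 4, 5, 5, 6, 7)

# Hypothetical Dataset 2: start from a bimodal shape, then rescale it to
# have the same mean and standard deviation as d1
raw2 <- c(0, 0, 0, 10, 10, 10)
d2 <- mean(d1) + (raw2 - mean(raw2)) / sd(raw2) * sd(d1)

# Standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# The only two numbers a dynamite plot draws are identical...
rbind(d1 = c(mean = mean(d1), se = se(d1)),
      d2 = c(mean = mean(d2), se = se(d2)))

# ...but every value of d2 sits about 1.29 units from the mean,
# whereas d1 has observations at the mean itself
sort(d2)

# A univariate scatter plot reveals the difference (drawn only interactively)
if (interactive()) {
  stripchart(list(`Dataset 1` = d1, `Dataset 2` = d2),
             method = "stack", vertical = TRUE, pch = 19)
}
```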

Use clear, detailed, and thorough labeling, especially if there is risk of distortion or ambiguity

How did the number of gun deaths change after the introduction of the Stand Your Ground law?

https://www.heap.io/blog/how-to-lie-with-data-visualization
https://visualisingdata.com/2014/04/the-fine-line-between-confusion-and-deception/

Show data variation, not design variation

  • Numbers on the left-hand side are uninformative because each survey has a different sample size

  • Cartoon people are redundant

  • Proportions would be impossible to see without annotations

  • Difficult to compare across years

Obese people in the Canary Islands in 2004, 2009 and 2015. Pink area shows the proportion of people who are obese, while grey area is related to non-obese people. The percentages refer to the total number of people of their respective group.

Figure 1, Hernández-Yumar, et al. (2019)

Show data variation, not design variation

The cartoon women are scaled in both height and width in proportion to the height being shown, so their area grows with the square of the data value

See also: Representation of numbers should equal quantities represented

https://x.com/reina_sabah/status/1291509085855260672

Show data variation, not design variation

Classification of [Transcription factor binding sites] Regions

Figure 1, Cawley, et al. (2004)

In time-series displays of money, use inflation-adjusted units

Year-by-year changes in core funding in the UK relative to year 2010, not adjusted for inflation

https://www.ft.com/content/bc19bbf4-2939-489e-a113-e21d5baf356d

In time-series displays of money, use inflation-adjusted units

Year-by-year changes in core funding in the UK relative to year 2010, after adjusting for inflation

https://www.ft.com/content/bc19bbf4-2939-489e-a113-e21d5baf356d

In time-series displays of money, use inflation-adjusted units

Year-by-year changes in core funding per capita in the UK relative to year 2010, after adjusting for inflation

https://www.ft.com/content/bc19bbf4-2939-489e-a113-e21d5baf356d
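The three FT panels correspond to three successive transformations of one series. A minimal sketch in R, assuming a made-up funding series, price index, and population (none of these numbers come from the FT chart, and the column names are hypothetical): deflate each year's nominal amount by the ratio of the base-year price index to that year's index, optionally divide by population, and then express each year relative to 2010.

```r
# Hypothetical inputs; not the FT data
funding <- data.frame(
  year       = c(2010, 2015, 2020),
  nominal    = c(100, 110, 120),  # funding in current pounds (millions)
  price_idx  = c(100, 115, 130),  # price index, 2010 = 100
  population = c(60, 62, 65)      # millions of people
)

# Proportional change relative to the first (2010) value
rel_to_base <- function(x) x / x[1] - 1

# Constant 2010 pounds: multiply by (base-year index / that year's index)
funding$real <- funding$nominal * funding$price_idx[1] / funding$price_idx
funding$real_per_capita <- funding$real / funding$population

funding$nominal_change <- rel_to_base(funding$nominal)
funding$real_change    <- rel_to_base(funding$real)
funding$real_pc_change <- rel_to_base(funding$real_per_capita)

funding[, c("year", "nominal_change", "real_change", "real_pc_change")]
```

With these made-up numbers, nominal funding rises 20% by 2020, while real funding falls by about 7.7% and real per-capita funding by about 14.8%: the direction of the apparent trend flips once inflation is accounted for.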

Number of dimensions (encodings) should not exceed number of datapoints

Empirical coverage of CIs for the relative-risk parameter b of haplotype 01100. Results are based on 10,000 simulated data sets with the same haplotype frequencies as the FUSION data.

Figure 1, Epstein and Satten (2003)

Number of dimensions (encodings) should not exceed number of datapoints

Departure from Hardy-Weinberg equilibrium under additive model (top) and multiplicative model (bottom). The authors note: “For a multiplicative model, [DHW] is equal to 0.”

Figure 1C and 1D, Wittke-Thompson, et al. (2005)

Kaiser Fung

Data visualization expert

Author of “Numbersense” and “Numbers Rule Your World”

Writes the Junk Charts blog

https://junkcharts.typepad.com/junk_charts/2024/12/the-wtf-moment.html

Giannarelli, et al. (2023)

Headline: “Massive increase in costs of welfare programs if fully utilized”

# hint 1: use the `labels = scales::dollar` argument in `scale_y_continuous` to format the y-axis as dollars
# hint 2: if you want to 'zoom in' on a plot, *don't* use the `limits` argument
# in `scale_y_continuous`. Doing so will actually drop the data and potentially
# change the plot itself. Use `coord_cartesian` instead
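Hint 2 is easy to see with a bar chart whose y-axis is truncated. The sketch below uses ggplot2 and scales with made-up dollar amounts (not the Giannarelli et al. figures; the data frame and column names are hypothetical). Setting `limits` in `scale_y_continuous()` censors any value outside the range to `NA`; because each bar's lower edge sits at $0, every bar is dropped when the plot is drawn (with a warning about removed rows). `coord_cartesian(ylim = ...)` only crops the view, so the bars survive, truncated, which is exactly what makes such a chart misleading.

```r
library(ggplot2)

# Hypothetical program costs; not the values from the chart on the slide
costs <- data.frame(
  scenario = c("Current", "Full participation"),
  cost     = c(150e9, 180e9)
)

base <- ggplot(costs, aes(x = scenario, y = cost)) +
  geom_col()

# Zooming with scale limits: each bar's lower edge (y = 0) falls outside
# the limits, so it is censored to NA and the bars are not drawn
p_limits <- base +
  scale_y_continuous(labels = scales::dollar, limits = c(100e9, 200e9))

# Zooming with the coordinate system: data untouched, view cropped
p_coord <- base +
  scale_y_continuous(labels = scales::dollar) +
  coord_cartesian(ylim = c(100e9, 200e9))

# Inspect the bar edges each plot will actually draw
layer_data(p_limits)[, c("ymin", "ymax")]
layer_data(p_coord)[, c("ymin", "ymax")]
```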

Code Together Task

No Spice: Make an approximate version of the bar chart on slide 20

Weak Sauce: No menu options today…

Medium Spice: Make an approximate version of the misleading bar chart on slide 19

Yoga Flame: No menu options today…

Dim Mak: Make an exact replicate of the misleading bar chart on slide 19. I’m looking for perfection!

References

Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J. and Wheeler, R., 2004. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell, 116(4), pp.499-509.

Doggett, T.J. and Way, C., 2024. Dynamite plots in surgical research over 10 years: a meta-study using machine-learning analysis. Postgraduate Medical Journal, 100(1182), pp.262-266.

Epstein, M.P. and Satten, G.A., 2003. Inference on haplotype effects in case-control studies using unphased genotype data. The American Journal of Human Genetics, 73(6), pp.1316-1329.

Giannarelli, L., Minton, S., Wheaton, L. and Knowles, S., 2023. A Safety Net with 100 Percent Participation: How Much Would Benefits Increase and Poverty Decline? Washington, DC: The Urban Institute.

Hernández-Yumar, A., Abásolo Alessón, I. and González López-Valcárcel, B., 2019. Economic crisis and obesity in the Canary Islands: an exploratory study through the relationship between body mass index and educational level. BMC Public Health, 19, pp.1-9.

Wittke-Thompson, J.K., Pluzhnikov, A. and Cox, N.J., 2005. Rational inferences about departures from Hardy-Weinberg equilibrium. The American Journal of Human Genetics, 76(6), pp.967-986.